Overview

Dataset statistics

Number of variables: 17
Number of observations: 9471
Missing cells: 20652
Missing cells (%): 12.8%
Duplicate rows: 113
Duplicate rows (%): 1.2%
Total size in memory: 1.2 MiB
Average record size in memory: 136.0 B
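
The missing-cell and duplicate figures can be cross-checked against the per-column counts listed under Warnings. The following is a minimal Python sketch of that arithmetic. The variable names are illustrative, and the reading that the 114 fully empty rows explain the 113 duplicates (every empty row after the first counts as a duplicate) is an assumption, not something the report states.

```python
# Figures copied from the report.
n_rows = 9471
n_columns = 17
missing_per_regular_column = 114   # Date, Time and the 13 numeric columns
n_regular_columns = 15
n_empty_columns = 2                # "Unnamed: 15" and "Unnamed: 16"

total_cells = n_rows * n_columns
missing_cells = (n_regular_columns * missing_per_regular_column
                 + n_empty_columns * n_rows)
missing_pct = 100 * missing_cells / total_cells

# Assumption: the 114 all-empty rows are identical, so all but the first
# would be flagged as duplicates.
assumed_duplicates = missing_per_regular_column - 1
duplicate_pct = 100 * assumed_duplicates / n_rows

print(missing_cells, round(missing_pct, 1))
print(assumed_duplicates, round(duplicate_pct, 1))
```

Both results agree with the reported 20652 missing cells (12.8%) and 113 duplicate rows (1.2%).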

Variable types

NUM: 13
CAT: 2
UNSUPPORTED: 2

Warnings

Dataset has 113 (1.2%) duplicate rows Duplicates
Date has a high cardinality: 391 distinct values High cardinality
PT08.S2(NMHC) is highly correlated with PT08.S1(CO) and 1 other field High correlation
PT08.S1(CO) is highly correlated with PT08.S2(NMHC) High correlation
PT08.S5(O3) is highly correlated with PT08.S2(NMHC) High correlation
T is highly correlated with C6H6(GT) and 1 other field High correlation
C6H6(GT) is highly correlated with T and 2 other fields High correlation
RH is highly correlated with C6H6(GT) and 1 other field High correlation
AH is highly correlated with C6H6(GT) and 2 other fields High correlation
Date has 114 (1.2%) missing values Missing
Time has 114 (1.2%) missing values Missing
CO(GT) has 114 (1.2%) missing values Missing
PT08.S1(CO) has 114 (1.2%) missing values Missing
NMHC(GT) has 114 (1.2%) missing values Missing
C6H6(GT) has 114 (1.2%) missing values Missing
PT08.S2(NMHC) has 114 (1.2%) missing values Missing
NOx(GT) has 114 (1.2%) missing values Missing
PT08.S3(NOx) has 114 (1.2%) missing values Missing
NO2(GT) has 114 (1.2%) missing values Missing
PT08.S4(NO2) has 114 (1.2%) missing values Missing
PT08.S5(O3) has 114 (1.2%) missing values Missing
T has 114 (1.2%) missing values Missing
RH has 114 (1.2%) missing values Missing
AH has 114 (1.2%) missing values Missing
Unnamed: 15 has 9471 (100.0%) missing values Missing
Unnamed: 16 has 9471 (100.0%) missing values Missing
Date is uniformly distributed Uniform
Time is uniformly distributed Uniform
Unnamed: 15 is an unsupported type, check if it needs cleaning or further analysis Unsupported
Unnamed: 16 is an unsupported type, check if it needs cleaning or further analysis Unsupported

Reproduction

Analysis started: 2020-11-19 00:49:07.342038
Analysis finished: 2020-11-19 00:49:40.965110
Duration: 33.62 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

Date
Categorical

HIGH CARDINALITY
MISSING
UNIFORM

Distinct: 391
Distinct (%): 4.2%
Missing: 114
Missing (%): 1.2%
Memory size: 74.0 KiB

Value                Count  Frequency (%)
2005-04-02           24     0.3%
2004-05-17           24     0.3%
2004-05-26           24     0.3%
2004-08-13           24     0.3%
2005-01-27           24     0.3%
2004-04-19           24     0.3%
2004-12-08           24     0.3%
2004-06-07           24     0.3%
2004-10-28           24     0.3%
2005-02-17           24     0.3%
Other values (381)   9117   96.3%
(Missing)            114    1.2%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 10
Median length: 10
Mean length: 9.915742794
Min length: 3
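
A minimum length of 3 in a column of 10-character dates is consistent with the 114 missing entries being measured as the 3-character string "nan". That is an assumption about how the lengths were computed, not something the report states. The short Python check below shows that it reproduces the reported mean length.

```python
# Figures copied from the report.
n_rows = 9471
n_missing = 114
date_length = len("2004-03-10")   # every date has 10 characters
nan_length = len("nan")           # assumed stand-in for a missing value

n_present = n_rows - n_missing
mean_length = (n_present * date_length + n_missing * nan_length) / n_rows
print(round(mean_length, 6))
```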

Time
Categorical

MISSING
UNIFORM

Distinct: 24
Distinct (%): 0.3%
Missing: 114
Missing (%): 1.2%
Memory size: 74.0 KiB

Value                Count  Frequency (%)
18:00:00             390    4.1%
10:00:00             390    4.1%
11:00:00             390    4.1%
8:00:00              390    4.1%
4:00:00              390    4.1%
9:00:00              390    4.1%
3:00:00              390    4.1%
12:00:00             390    4.1%
23:00:00             390    4.1%
20:00:00             390    4.1%
Other values (14)    5457   57.6%

Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 8
Median length: 8
Mean length: 7.528032943
Min length: 3

CO(GT)
Real number (ℝ)

MISSING

Distinct: 97
Distinct (%): 1.0%
Missing: 114
Missing (%): 1.2%
Infinite: 0
Infinite (%): 0.0%
Mean: -34.20752378
Minimum: -200
Maximum: 11.9
Zeros: 0
Zeros (%): 0.0%
Memory size: 74.0 KiB

Quantile statistics

Minimum: -200
5-th percentile: -200
Q1: 0.6
Median: 1.5
Q3: 2.6
95-th percentile: 4.7
Maximum: 11.9
Range: 211.9
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 77.65717035
Coefficient of variation (CV): -2.270178071
Kurtosis: 0.7783055185
Mean: -34.20752378
Median Absolute Deviation (MAD): 1
Skewness: -1.666179502
Sum: -320079.8
Variance: 6030.636106
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                Count  Frequency (%)
-200                 1683   17.8%
1                    305    3.2%
1.4                  279    2.9%
1.6                  275    2.9%
1.5                  273    2.9%
1.1                  262    2.8%
0.7                  260    2.7%
1.7                  258    2.7%
1.3                  253    2.7%
0.8                  251    2.7%
Other values (87)    5258   55.5%

Minimum 5 values

Value                Count  Frequency (%)
-200                 1683   17.8%
0.1                  33     0.3%
0.2                  45     0.5%
0.3                  98     1.0%
0.4                  160    1.7%

Maximum 5 values

Value                Count  Frequency (%)
11.9                 1      < 0.1%
11.5                 1      < 0.1%
10.2                 2      < 0.1%
10.1                 1      < 0.1%
9.9                  1      < 0.1%
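
The value -200 is the minimum of every numeric column and accounts for 17.8% of the CO(GT) rows, which suggests a placeholder for missing readings rather than a measurement. Treating it that way is an assumption here, since the report does not say so. The Python sketch below uses only the CO(GT) figures reported above to show how far those entries pull the mean; the variable names are illustrative.

```python
# Figures copied from the CO(GT) section of the report.
n_rows = 9471
n_missing = 114
reported_sum = -320079.8
sentinel = -200          # assumed placeholder for a missing reading
n_sentinel = 1683        # rows equal to -200

n_present = n_rows - n_missing
mean_with_sentinel = reported_sum / n_present

# Remove the sentinel's contribution from both the sum and the count.
clean_sum = reported_sum - sentinel * n_sentinel
clean_count = n_present - n_sentinel
mean_without_sentinel = clean_sum / clean_count

print(round(mean_with_sentinel, 4))
print(round(mean_without_sentinel, 4))
```

The first value matches the reported mean. The second lies between the reported median (1.5) and Q3 (2.6), which is a more plausible centre for the measured values.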
 

PT08.S1(CO)
Real number (ℝ)

HIGH CORRELATION
MISSING

Distinct: 1042
Distinct (%): 11.1%
Missing: 114
Missing (%): 1.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 1048.990061
Minimum: -200
Maximum: 2040
Zeros: 0
Zeros (%): 0.0%
Memory size: 74.0 KiB

Quantile statistics

Minimum: -200
5-th percentile: 746
Q1: 921
Median: 1053
Q3: 1221
95-th percentile: 1502
Maximum: 2040
Range: 2240
Interquartile range (IQR): 300

Descriptive statistics

Standard deviation: 329.8327099
Coefficient of variation (CV): 0.3144288227
Kurtosis: 5.836935683
Mean: 1048.990061
Median Absolute Deviation (MAD): 147
Skewness: -1.721503448
Sum: 9815400
Variance: 108789.6165
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                Count  Frequency (%)
-200                 366    3.9%
973                  30     0.3%
1100                 28     0.3%
969                  26     0.3%
938                  26     0.3%
988                  26     0.3%
925                  26     0.3%
970                  25     0.3%
987                  25     0.3%
984                  25     0.3%
Other values (1032)  8754   92.4%
(Missing)            114    1.2%

Minimum 5 values

Value                Count  Frequency (%)
-200                 366    3.9%
647                  1      < 0.1%
649                  1      < 0.1%
655                  1      < 0.1%
667                  3      < 0.1%

Maximum 5 values

Value                Count  Frequency (%)
2040                 1      < 0.1%
2008                 1      < 0.1%
1982                 1      < 0.1%
1975                 1      < 0.1%
1973                 1      < 0.1%
 

NMHC(GT)
Real number (ℝ)

MISSING

Distinct: 430
Distinct (%): 4.6%
Missing: 114
Missing (%): 1.2%
Infinite: 0
Infinite (%): 0.0%
Mean: -159.090093
Minimum: -200
Maximum: 1189
Zeros: 0
Zeros (%): 0.0%
Memory size: 74.0 KiB

Quantile statistics

Minimum: -200
5-th percentile: -200
Q1: -200
Median: -200
Q3: -200
95-th percentile: 144.2
Maximum: 1189
Range: 1389
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 139.7890929
Coefficient of variation (CV): -0.8786788057
Kurtosis: 18.86382399
Mean: -159.090093
Median Absolute Deviation (MAD): 0
Skewness: 4.075784452
Sum: -1488606
Variance: 19540.99049
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                Count  Frequency (%)
-200                 8443   89.1%
66                   14     0.1%
29                   9      0.1%
40                   9      0.1%
88                   8      0.1%
93                   8      0.1%
57                   7      0.1%
55                   7      0.1%
95                   7      0.1%
84                   7      0.1%
Other values (420)   838    8.8%
(Missing)            114    1.2%

Minimum 5 values

Value                Count  Frequency (%)
-200                 8443   89.1%
7                    1      < 0.1%
8                    1      < 0.1%
9                    1      < 0.1%
10                   1      < 0.1%

Maximum 5 values

Value                Count  Frequency (%)
1189                 1      < 0.1%
1129                 1      < 0.1%
1084                 1      < 0.1%
1042                 1      < 0.1%
974                  1      < 0.1%
 

C6H6(GT)
Real number (ℝ)

HIGH CORRELATION
MISSING

Distinct: 408
Distinct (%): 4.4%
Missing: 114
Missing (%): 1.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.865683446
Minimum: -200
Maximum: 63.7
Zeros: 0
Zeros (%): 0.0%
Memory size: 74.0 KiB

Quantile statistics

Minimum: -200
5-th percentile: 0.7
Q1: 4
Median: 7.9
Q3: 13.6
95-th percentile: 24.42
Maximum: 63.7
Range: 263.7
Interquartile range (IQR): 9.6

Descriptive statistics

Standard deviation: 41.38020644
Coefficient of variation (CV): 22.17965032
Kurtosis: 19.18865057
Mean: 1.865683446
Median Absolute Deviation (MAD): 4.5
Skewness: -4.508762883
Sum: 17457.2
Variance: 1712.321485
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                Count  Frequency (%)
-200                 366    3.9%
3.6                  84     0.9%
2.8                  82     0.9%
3.8                  79     0.8%
4                    78     0.8%
3.1                  77     0.8%
3                    76     0.8%
2.5                  75     0.8%
2.9                  73     0.8%
5.4                  72     0.8%
Other values (398)   8295   87.6%
(Missing)            114    1.2%

Minimum 5 values

Value                Count  Frequency (%)
-200                 366    3.9%
0.1                  2      < 0.1%
0.2                  8      0.1%
0.3                  10     0.1%
0.4                  14     0.1%

Maximum 5 values

Value                Count  Frequency (%)
63.7                 1      < 0.1%
52.1                 1      < 0.1%
50.8                 1      < 0.1%
50.7                 1      < 0.1%
50.6                 1      < 0.1%
 

PT08.S2(NMHC)
Real number (ℝ)

HIGH CORRELATION
MISSING

Distinct: 1246
Distinct (%): 13.3%
Missing: 114
Missing (%): 1.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 894.5952763
Minimum: -200
Maximum: 2214
Zeros: 0
Zeros (%): 0.0%
Memory size: 74.0 KiB

Quantile statistics

Minimum: -200
5-th percentile: 471
Q1: 711
Median: 895
Q3: 1105
95-th percentile: 1415
Maximum: 2214
Range: 2414
Interquartile range (IQR): 394

Descriptive statistics

Standard deviation: 342.3332516
Coefficient of variation (CV): 0.3826682979
Kurtosis: 2.370088799
Mean: 894.5952763
Median Absolute Deviation (MAD): 195
Skewness: -0.7934346434
Sum: 8370728
Variance: 117192.0552
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                Count  Frequency (%)
-200                 366    3.9%
853                  25     0.3%
880                  23     0.2%
800                  23     0.2%
859                  23     0.2%
985                  22     0.2%
850                  21     0.2%
783                  21     0.2%
769                  21     0.2%
776                  21     0.2%
Other values (1236)  8791   92.8%
(Missing)            114    1.2%

Minimum 5 values

Value                Count  Frequency (%)
-200                 366    3.9%
383                  2      < 0.1%
387                  1      < 0.1%
388                  1      < 0.1%
390                  2      < 0.1%

Maximum 5 values

Value                Count  Frequency (%)
2214                 1      < 0.1%
2007                 1      < 0.1%
1983                 1      < 0.1%
1981                 1      < 0.1%
1980                 1      < 0.1%
 

NOx(GT)
Real number (ℝ)

MISSING

Distinct: 926
Distinct (%): 9.9%
Missing: 114
Missing (%): 1.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 168.6169713
Minimum: -200
Maximum: 1479
Zeros: 0
Zeros (%): 0.0%
Memory size: 74.0 KiB

Quantile statistics

Minimum: -200
5-th percentile: -200
Q1: 50
Median: 141
Q3: 284
95-th percentile: 653.2
Maximum: 1479
Range: 1679
Interquartile range (IQR): 234

Descriptive statistics

Standard deviation: 257.4338663
Coefficient of variation (CV): 1.526737578
Kurtosis: 1.505417097
Mean: 168.6169713
Median Absolute Deviation (MAD): 109
Skewness: 0.8252321889
Sum: 1577749
Variance: 66272.19551
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                Count  Frequency (%)
-200                 1639   17.3%
89                   41     0.4%
65                   37     0.4%
41                   36     0.4%
122                  36     0.4%
93                   36     0.4%
180                  35     0.4%
132                  35     0.4%
95                   35     0.4%
51                   34     0.4%
Other values (916)   7393   78.1%
(Missing)            114    1.2%

Minimum 5 values

Value                Count  Frequency (%)
-200                 1639   17.3%
2                    1      < 0.1%
4                    1      < 0.1%
6                    1      < 0.1%
7                    1      < 0.1%

Maximum 5 values

Value                Count  Frequency (%)
1479                 1      < 0.1%
1389                 2      < 0.1%
1369                 1      < 0.1%
1358                 1      < 0.1%
1345                 1      < 0.1%
 

PT08.S3(NOx)
Real number (ℝ)

MISSING

Distinct: 1222
Distinct (%): 13.1%
Missing: 114
Missing (%): 1.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 794.9901678
Minimum: -200
Maximum: 2683
Zeros: 0
Zeros (%): 0.0%
Memory size: 74.0 KiB

Quantile statistics

Minimum: -200
5-th percentile: 410
Q1: 637
Median: 794
Q3: 960
95-th percentile: 1281.2
Maximum: 2683
Range: 2883
Interquartile range (IQR): 323

Descriptive statistics

Standard deviation: 321.9935516
Coefficient of variation (CV): 0.4050283446
Kurtosis: 3.104825915
Mean: 794.9901678
Median Absolute Deviation (MAD): 161
Skewness: -0.3847597666
Sum: 7438723
Variance: 103679.8473
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                Count  Frequency (%)
-200                 366    3.9%
767                  25     0.3%
733                  25     0.3%
846                  25     0.3%
765                  23     0.2%
876                  23     0.2%
845                  22     0.2%
800                  22     0.2%
872                  22     0.2%
816                  22     0.2%
Other values (1212)  8782   92.7%
(Missing)            114    1.2%

Minimum 5 values

Value                Count  Frequency (%)
-200                 366    3.9%
322                  1      < 0.1%
325                  2      < 0.1%
328                  1      < 0.1%
330                  2      < 0.1%

Maximum 5 values

Value                Count  Frequency (%)
2683                 1      < 0.1%
2559                 1      < 0.1%
2542                 1      < 0.1%
2331                 1      < 0.1%
2327                 1      < 0.1%
 

NO2(GT)
Real number (ℝ)

MISSING

Distinct: 284
Distinct (%): 3.0%
Missing: 114
Missing (%): 1.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 58.1488725
Minimum: -200
Maximum: 340
Zeros: 0
Zeros (%): 0.0%
Memory size: 74.0 KiB

Quantile statistics

Minimum: -200
5-th percentile: -200
Q1: 53
Median: 96
Q3: 133
95-th percentile: 194
Maximum: 340
Range: 540
Interquartile range (IQR): 80

Descriptive statistics

Standard deviation: 126.9404553
Coefficient of variation (CV): 2.183025221
Kurtosis: 0.2755990718
Mean: 58.1488725
Median Absolute Deviation (MAD): 40
Skewness: -1.22562964
Sum: 544099
Variance: 16113.87918
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                Count  Frequency (%)
-200                 1642   17.3%
97                   78     0.8%
119                  77     0.8%
117                  77     0.8%
114                  75     0.8%
101                  75     0.8%
95                   75     0.8%
110                  74     0.8%
115                  73     0.8%
107                  72     0.8%
Other values (274)   7039   74.3%
(Missing)            114    1.2%

Minimum 5 values

Value                Count  Frequency (%)
-200                 1642   17.3%
2                    1      < 0.1%
3                    1      < 0.1%
5                    2      < 0.1%
7                    1      < 0.1%

Maximum 5 values

Value                Count  Frequency (%)
340                  1      < 0.1%
333                  1      < 0.1%
326                  1      < 0.1%
322                  1      < 0.1%
312                  1      < 0.1%
 

PT08.S4(NO2)
Real number (ℝ)

MISSING

Distinct: 1604
Distinct (%): 17.1%
Missing: 114
Missing (%): 1.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 1391.479641
Minimum: -200
Maximum: 2775
Zeros: 0
Zeros (%): 0.0%
Memory size: 74.0 KiB

Quantile statistics

Minimum: -200
5-th percentile: 757
Q1: 1185
Median: 1446
Q3: 1662
95-th percentile: 2020.2
Maximum: 2775
Range: 2975
Interquartile range (IQR): 477

Descriptive statistics

Standard deviation: 467.2101246
Coefficient of variation (CV): 0.3357649734
Kurtosis: 3.267027856
Mean: 1391.479641
Median Absolute Deviation (MAD): 236
Skewness: -1.244109947
Sum: 13020075
Variance: 218285.3005
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                Count  Frequency (%)
-200                 366    3.9%
1488                 24     0.3%
1580                 22     0.2%
1539                 21     0.2%
1467                 20     0.2%
1638                 19     0.2%
1490                 18     0.2%
1418                 18     0.2%
1570                 17     0.2%
1473                 17     0.2%
Other values (1594)  8815   93.1%
(Missing)            114    1.2%

Minimum 5 values

Value                Count  Frequency (%)
-200                 366    3.9%
551                  1      < 0.1%
559                  1      < 0.1%
561                  1      < 0.1%
579                  1      < 0.1%

Maximum 5 values

Value                Count  Frequency (%)
2775                 1      < 0.1%
2746                 1      < 0.1%
2691                 1      < 0.1%
2684                 1      < 0.1%
2679                 1      < 0.1%
 

PT08.S5(O3)
Real number (ℝ)

HIGH CORRELATION
MISSING

Distinct: 1744
Distinct (%): 18.6%
Missing: 114
Missing (%): 1.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 975.0720316
Minimum: -200
Maximum: 2523
Zeros: 0
Zeros (%): 0.0%
Memory size: 74.0 KiB

Quantile statistics

Minimum: -200
5-th percentile: 348
Q1: 700
Median: 942
Q3: 1255
95-th percentile: 1750
Maximum: 2523
Range: 2723
Interquartile range (IQR): 555

Descriptive statistics

Standard deviation: 456.9381845
Coefficient of variation (CV): 0.4686199272
Kurtosis: 0.6382966399
Mean: 975.0720316
Median Absolute Deviation (MAD): 272
Skewness: -0.03466187982
Sum: 9123749
Variance: 208792.5044
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                Count  Frequency (%)
-200                 366    3.9%
836                  20     0.2%
825                  20     0.2%
826                  19     0.2%
926                  18     0.2%
799                  17     0.2%
777                  17     0.2%
923                  16     0.2%
905                  16     0.2%
891                  16     0.2%
Other values (1734)  8832   93.3%
(Missing)            114    1.2%

Minimum 5 values

Value                Count  Frequency (%)
-200                 366    3.9%
221                  1      < 0.1%
225                  1      < 0.1%
227                  1      < 0.1%
232                  1      < 0.1%

Maximum 5 values

Value                Count  Frequency (%)
2523                 1      < 0.1%
2522                 1      < 0.1%
2519                 1      < 0.1%
2515                 1      < 0.1%
2494                 1      < 0.1%
 

T
Real number (ℝ)

HIGH CORRELATION
MISSING

Distinct: 437
Distinct (%): 4.7%
Missing: 114
Missing (%): 1.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 9.778305012
Minimum: -200
Maximum: 44.6
Zeros: 1
Zeros (%): < 0.1%
Memory size: 74.0 KiB

Quantile statistics

Minimum: -200
5-th percentile: 2.5
Q1: 10.9
Median: 17.2
Q3: 24.1
95-th percentile: 34.3
Maximum: 44.6
Range: 244.6
Interquartile range (IQR): 13.2

Descriptive statistics

Standard deviation: 43.20362306
Coefficient of variation (CV): 4.418314116
Kurtosis: 18.77480657
Mean: 9.778305012
Median Absolute Deviation (MAD): 6.6
Skewness: -4.445467033
Sum: 91495.6
Variance: 1866.553046
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                Count  Frequency (%)
-200                 366    3.9%
20.8                 57     0.6%
21.3                 54     0.6%
20.2                 51     0.5%
13.8                 51     0.5%
12                   49     0.5%
15.6                 49     0.5%
12.3                 49     0.5%
16.3                 48     0.5%
19.8                 48     0.5%
Other values (427)   8535   90.1%
(Missing)            114    1.2%

Minimum 5 values

Value                Count  Frequency (%)
-200                 366    3.9%
-1.9                 1      < 0.1%
-1.4                 1      < 0.1%
-1.3                 2      < 0.1%
-1.2                 1      < 0.1%

Maximum 5 values

Value                Count  Frequency (%)
44.6                 1      < 0.1%
44.3                 1      < 0.1%
43.4                 1      < 0.1%
43.1                 1      < 0.1%
42.8                 3      < 0.1%
 

RH
Real number (ℝ)

HIGH CORRELATION
MISSING

Distinct: 754
Distinct (%): 8.1%
Missing: 114
Missing (%): 1.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 39.48537993
Minimum: -200
Maximum: 88.7
Zeros: 0
Zeros (%): 0.0%
Memory size: 74.0 KiB

Quantile statistics

Minimum: -200
5-th percentile: 15
Q1: 34.1
Median: 48.6
Q3: 61.9
95-th percentile: 77.6
Maximum: 88.7
Range: 288.7
Interquartile range (IQR): 27.8

Descriptive statistics

Standard deviation: 51.21614497
Coefficient of variation (CV): 1.297091355
Kurtosis: 15.76415389
Mean: 39.48537993
Median Absolute Deviation (MAD): 13.9
Skewness: -3.932407357
Sum: 369464.7
Variance: 2623.093506
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                Count  Frequency (%)
-200                 366    3.9%
53.1                 31     0.3%
57.9                 30     0.3%
47.8                 30     0.3%
45.9                 27     0.3%
60.8                 27     0.3%
50.1                 26     0.3%
47.6                 26     0.3%
50.9                 26     0.3%
57.6                 26     0.3%
Other values (744)   8742   92.3%
(Missing)            114    1.2%

Minimum 5 values

Value                Count  Frequency (%)
-200                 366    3.9%
9.2                  2      < 0.1%
9.3                  1      < 0.1%
9.6                  1      < 0.1%
9.8                  1      < 0.1%

Maximum 5 values

Value                Count  Frequency (%)
88.7                 1      < 0.1%
87.2                 1      < 0.1%
87.1                 1      < 0.1%
87                   1      < 0.1%
86.6                 2      < 0.1%
 

AH
Real number (ℝ)

HIGH CORRELATION
MISSING

Distinct: 6684
Distinct (%): 71.4%
Missing: 114
Missing (%): 1.2%
Infinite: 0
Infinite (%): 0.0%
Mean: -6.837603644
Minimum: -200
Maximum: 2.231
Zeros: 0
Zeros (%): 0.0%
Memory size: 74.0 KiB

Quantile statistics

Minimum: -200
5-th percentile: 0.29506
Q1: 0.6923
Median: 0.9768
Q3: 1.2962
95-th percentile: 1.72044
Maximum: 2.231
Range: 202.231
Interquartile range (IQR): 0.6039

Descriptive statistics

Standard deviation: 38.97667017
Coefficient of variation (CV): -5.700340674
Kurtosis: 20.61309172
Mean: -6.837603644
Median Absolute Deviation (MAD): 0.3022
Skewness: -4.75457029
Sum: -63979.4573
Variance: 1519.180817
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value                Count  Frequency (%)
-200                 366    3.9%
1.1199               6      0.1%
0.8394               6      0.1%
0.9684               6      0.1%
0.7487               6      0.1%
0.9722               6      0.1%
0.8736               5      0.1%
0.9271               5      0.1%
0.8325               5      0.1%
0.6686               5      0.1%
Other values (6674)  8941   94.4%
(Missing)            114    1.2%

Minimum 5 values

Value                Count  Frequency (%)
-200                 366    3.9%
0.1847               1      < 0.1%
0.1862               1      < 0.1%
0.191                1      < 0.1%
0.1975               1      < 0.1%

Maximum 5 values

Value                Count  Frequency (%)
2.231                1      < 0.1%
2.1806               1      < 0.1%
2.1766               1      < 0.1%
2.1719               1      < 0.1%
2.1395               1      < 0.1%
 

Unnamed: 15
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 9471
Missing (%): 100.0%
Memory size: 74.1 KiB

Unnamed: 16
Unsupported

MISSING
REJECTED
UNSUPPORTED

Missing: 9471
Missing (%): 100.0%
Memory size: 74.1 KiB

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, where -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line (its angle to the x-axis) does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
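
As a minimal sketch of this definition, the Python function below divides the covariance by the product of the standard deviations, using only the standard library. The function name `pearson_r` is illustrative, and the sample inputs are the T and RH values from the first ten rows shown under "First rows".

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of products are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# T and RH from the first ten sample rows.
t = [13.6, 13.3, 11.9, 11.0, 11.2, 11.2, 11.3, 10.7, 10.7, 10.3]
rh = [48.9, 47.7, 54.0, 60.0, 59.6, 59.2, 56.8, 60.0, 59.7, 60.2]
print(pearson_r(t, rh))
```

Shifting or rescaling either input by a positive factor, for example converting T to another temperature scale, leaves the result unchanged. This is the invariance described above.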

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1, where -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
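
A minimal sketch of this calculation: rank each variable, then apply the Pearson formula to the ranks. Tied values share the average of their positions, which is a common convention that the text above does not specify. The helper names `average_ranks` and `spearman_rho` are illustrative.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# A monotonic but nonlinear relationship still gives rho of 1
# (up to floating-point rounding).
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y))
```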

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, where -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
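
The pair-counting description above can be sketched in a few lines of Python. The sketch follows the formula exactly as stated (sometimes called tau-a), so tied pairs count as neither concordant nor discordant but still appear in the denominator. Library implementations often apply a tie correction instead. The function name `kendall_tau` is illustrative.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # The sign of the product tells whether x and y move in the same direction.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)

# Three pairs: two concordant, one discordant.
print(kendall_tau([1, 2, 3], [1, 3, 2]))
```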

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

Sample

First rows

Row  Date        Time      CO(GT)  PT08.S1(CO)  NMHC(GT)  C6H6(GT)  PT08.S2(NMHC)  NOx(GT)  PT08.S3(NOx)  NO2(GT)  PT08.S4(NO2)  PT08.S5(O3)  T     RH    AH      Unnamed: 15  Unnamed: 16
0    2004-03-10  18:00:00  2.6     1360.0       150.0     11.9      1046.0         166.0    1056.0        113.0    1692.0        1268.0       13.6  48.9  0.7578  NaN          NaN
1    2004-03-10  19:00:00  2.0     1292.0       112.0     9.4       955.0          103.0    1174.0        92.0     1559.0        972.0        13.3  47.7  0.7255  NaN          NaN
2    2004-03-10  20:00:00  2.2     1402.0       88.0      9.0       939.0          131.0    1140.0        114.0    1555.0        1074.0       11.9  54.0  0.7502  NaN          NaN
3    2004-03-10  21:00:00  2.2     1376.0       80.0      9.2       948.0          172.0    1092.0        122.0    1584.0        1203.0       11.0  60.0  0.7867  NaN          NaN
4    2004-03-10  22:00:00  1.6     1272.0       51.0      6.5       836.0          131.0    1205.0        116.0    1490.0        1110.0       11.2  59.6  0.7888  NaN          NaN
5    2004-03-10  23:00:00  1.2     1197.0       38.0      4.7       750.0          89.0     1337.0        96.0     1393.0        949.0        11.2  59.2  0.7848  NaN          NaN
6    2004-03-11  0:00:00   1.2     1185.0       31.0      3.6       690.0          62.0     1462.0        77.0     1333.0        733.0        11.3  56.8  0.7603  NaN          NaN
7    2004-03-11  1:00:00   1.0     1136.0       31.0      3.3       672.0          62.0     1453.0        76.0     1333.0        730.0        10.7  60.0  0.7702  NaN          NaN
8    2004-03-11  2:00:00   0.9     1094.0       24.0      2.3       609.0          45.0     1579.0        60.0     1276.0        620.0        10.7  59.7  0.7648  NaN          NaN
9    2004-03-11  3:00:00   0.6     1010.0       19.0      1.7       561.0          -200.0   1705.0        -200.0   1235.0        501.0        10.3  60.2  0.7517  NaN          NaN

Last rows

Row   Date        Time      CO(GT)  PT08.S1(CO)  NMHC(GT)  C6H6(GT)  PT08.S2(NMHC)  NOx(GT)  PT08.S3(NOx)  NO2(GT)  PT08.S4(NO2)  PT08.S5(O3)  T     RH    AH      Unnamed: 15  Unnamed: 16
9461  NaN         NaN       NaN     NaN          NaN       NaN       NaN            NaN      NaN           NaN      NaN           NaN          NaN   NaN   NaN     NaN          NaN
9462  NaN         NaN       NaN     NaN          NaN       NaN       NaN            NaN      NaN           NaN      NaN           NaN          NaN   NaN   NaN     NaN          NaN
9463  NaN         NaN       NaN     NaN          NaN       NaN       NaN            NaN      NaN           NaN      NaN           NaN          NaN   NaN   NaN     NaN          NaN
9464  NaN         NaN       NaN     NaN          NaN       NaN       NaN            NaN      NaN           NaN      NaN           NaN          NaN   NaN   NaN     NaN          NaN
9465  NaN         NaN       NaN     NaN          NaN       NaN       NaN            NaN      NaN           NaN      NaN           NaN          NaN   NaN   NaN     NaN          NaN
9466  NaN         NaN       NaN     NaN          NaN       NaN       NaN            NaN      NaN           NaN      NaN           NaN          NaN   NaN   NaN     NaN          NaN
9467  NaN         NaN       NaN     NaN          NaN       NaN       NaN            NaN      NaN           NaN      NaN           NaN          NaN   NaN   NaN     NaN          NaN
9468  NaN         NaN       NaN     NaN          NaN       NaN       NaN            NaN      NaN           NaN      NaN           NaN          NaN   NaN   NaN     NaN          NaN
9469  NaN         NaN       NaN     NaN          NaN       NaN       NaN            NaN      NaN           NaN      NaN           NaN          NaN   NaN   NaN     NaN          NaN
9470  NaN         NaN       NaN     NaN          NaN       NaN       NaN            NaN      NaN           NaN      NaN           NaN          NaN   NaN   NaN     NaN          NaN